Learning with Limited Rounds of Adaptivity: Coin Tossing, Multi-Armed Bandits, and Ranking from Pairwise Comparisons

Authors

  • Arpit Agarwal
  • Shivani Agarwal
  • Sepehr Assadi
  • Sanjeev Khanna

Abstract

In many learning settings, active/adaptive querying is possible, but the number of rounds of adaptivity is limited. We study the relationship between query complexity and adaptivity in identifying the k most biased coins among a set of n coins with unknown biases. This problem is a common abstraction of many well-studied problems, including identifying the k best arms in a stochastic multi-armed bandit and top-k ranking from pairwise comparisons. An r-round adaptive algorithm for the k most biased coins problem specifies, in each round, the set of coin tosses to be performed based on the outcomes observed in earlier rounds, and outputs the k most biased coins at the end of r rounds. When r = 1, the algorithm is called non-adaptive; when r is unbounded, it is called fully adaptive. While the power of adaptivity in reducing query complexity is well known, full adaptivity requires repeated interaction with the coin-tossing (feedback generation) mechanism and is highly sequential, since the set of coins to be tossed in each round can be determined only after the outcomes of the previous round have been observed. In contrast, algorithms with only a few rounds of adaptivity require fewer rounds of interaction with the feedback generation mechanism and offer the benefits of parallelism in algorithmic decision-making. Motivated by these considerations, we ask how much adaptivity is needed to realize the optimal worst-case query complexity for identifying the k most biased coins. Given any positive integer r, we derive essentially matching upper and lower bounds on the query complexity of r-round algorithms. We then show that Θ(log* n) rounds are both necessary and sufficient for achieving the optimal worst-case query complexity for identifying the k most biased coins.
In particular, our algorithm achieves the optimal query complexity in at most log* n rounds, which implies that on any realistic input, 5 parallel rounds of exploration suffice to achieve the optimal worst-case sample complexity. The best known algorithm prior to our work required Θ(log n) rounds to achieve the optimal worst-case query complexity, even for the special case of k = 1.
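To make the notion of an r-round algorithm concrete, here is a minimal Python sketch under stated assumptions. It is a simple elimination scheme in the spirit of successive halving, not the algorithm from this paper. Within a round, the whole toss schedule is fixed before any outcome of that round is seen. Outcomes are used only between rounds, to discard coins. The function name `r_round_top_k`, the per-round budget `tosses_per_round`, and the keep-the-top-half rule are hypothetical choices made for illustration.

```python
import random
from typing import List, Sequence


def r_round_top_k(biases: Sequence[float], k: int, r: int,
                  tosses_per_round: int, seed: int = 0) -> List[int]:
    """Hypothetical r-round elimination sketch (NOT the paper's algorithm).

    Returns the sorted indices of the k coins judged most biased after r rounds.
    """
    if not 1 <= k <= len(biases) or r < 1:
        raise ValueError("need 1 <= k <= n and r >= 1")
    rng = random.Random(seed)
    survivors = list(range(len(biases)))
    heads = [0] * len(biases)   # cumulative heads per coin
    flips = [0] * len(biases)   # cumulative tosses per coin
    for rnd in range(r):
        # The toss schedule for this round is fixed before any of its outcomes
        # are observed: split the round's budget evenly over the survivors.
        per_coin = max(1, tosses_per_round // len(survivors))
        for i in survivors:
            heads[i] += sum(rng.random() < biases[i] for _ in range(per_coin))
            flips[i] += per_coin
        # Adaptivity happens only here, between rounds: rank the survivors
        # by empirical bias (heads / tosses), highest first.
        survivors.sort(key=lambda i: heads[i] / flips[i], reverse=True)
        if rnd < r - 1:
            # Assumed rule: keep the top half, but never fewer than k coins.
            survivors = survivors[:max(k, len(survivors) // 2)]
    return sorted(survivors[:k])
```

With `r = 1` the sketch is non-adaptive: every coin gets the same number of tosses, fixed up front. With larger `r`, later rounds concentrate tosses on fewer surviving coins, which illustrates the kind of saving adaptivity can provide. Consider biases `[0.1, 0.2, 0.9, 0.3, 0.95, 0.15, 0.25, 0.05]`, `k = 2`, `r = 3`, and 800 tosses per round. Because coins 2 and 4 sit far above the rest, the sketch should pick them out with overwhelming probability.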
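The claim that 5 rounds suffice on realistic inputs rests on how slowly the iterated logarithm log* n grows. Below is a minimal check that assumes base-2 logarithms; the abstract does not state the base. The helper `log_star` is hypothetical and exists only for illustration.

```python
import math


def log_star(x: float) -> int:
    """Iterated base-2 logarithm (hypothetical helper): the number of times
    log2 must be applied before the value drops to at most 1."""
    count = 0
    while x > 1:
        x = math.log2(x)  # math.log2 also accepts very large Python ints
        count += 1
    return count
```

For instance, `log_star(65536)` is 4 and `log_star(2**65536)` is 5. Under this base-2 convention, log* n first reaches 6 only for n above 2^65536, so any input size that can actually be stored gives log* n ≤ 5.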

Related papers

Finding a most biased coin with fewest flips

We study the problem of learning the most biased coin among a set of coins by tossing the coins adaptively. The goal is to minimize the number of tosses to identify a coin i* such that Pr(coin i* is most biased) is at least 1 − δ for any given δ. Under a particular probabilistic model, we give an optimal algorithm, i.e., an algorithm that minimizes the expected number of tosses, to learn a mos...

Ranked bandits in metric spaces: learning diverse rankings over large document collections

Most learning to rank research has assumed that the utility of different documents is independent, which results in learned ranking functions that return redundant results. The few approaches that avoid this have rather unsatisfyingly lacked theoretical foundations, or do not scale. We present a learning-to-rank formulation that optimizes the fraction of satisfied users, with several scalable a...

Building Bridges: Viewing Active Learning from the Multi-Armed Bandit Lens

In this paper we propose a multi-armed bandit inspired, pool based active learning algorithm for the problem of binary classification. By carefully constructing an analogy between active learning and multi-armed bandits, we utilize ideas such as lower confidence bounds, and self-concordant regularization from the multi-armed bandit literature to design our proposed algorithm. Our algorithm is a...

When can we rank well from comparisons of O(n log n) non-actively chosen pairs?

Ranking from pairwise comparisons is a ubiquitous problem and has been studied in disciplines ranging from statistics to operations research and from theoretical computer science to machine learning. Here we consider a general setting where outcomes of pairwise comparisons between items i and j are drawn probabilistically by flipping a coin with unknown bias P_ij, and ask under what conditions ...

Staged Multi-armed Bandits

In conventional multi-armed bandits (MAB) and other reinforcement learning methods, the learner sequentially chooses actions and obtains a reward (which can be possibly missing, delayed or erroneous) after each taken action. This reward is then used by the learner to improve its future decisions. However, in numerous applications, ranging from personalized patient treatment to personalized web-...

Publication date: 2017